1 Project Objectives

1.1 Technical gaps

FANTOM represents a paradigm shift in the design and development of
embedded signal and image processing (ESIP) systems on chip, with high
throughput and low power consumption (HL) in performance as well as with
high turnaround rate and low overall cost (HL) in production.

There were a few large technical gaps to overcome in the design and
development of ESIP systems, which are particularly important to defense
research and applications:

□  the  gap  between  increasing  demands  for  ESIP  systems  and  the  lack  of 
means  to  meet  the  demands; 

□ the gaps in multiple aspects between general purpose systems with
commodity software and hardware on the one side and embedded systems
with customized hardware and software on the other side:

(+) performance: latency, accuracy, power consumption;

(+) form: size, weight;

(+) resource: area in particular;

(-) development cycle (turnaround) for customized applications;

(-) adaptability for updating algorithms or applications or both;

(-) complexity in design and development;

where ESIP systems were known to have potential in the aspects marked
by (+), and to face challenges in the aspects marked by (-).

□ the gap in knowledge and experience among the designers and developers,
namely, the expert, the outmoded and the novice.

The  long  cycle  in  design  and  development,  for  example,  had  been  a  key  factor, 
among  others,  responsible  for  the  lack  of,  or  slow,  updates  in  many  ESIP  systems 
important  or  critical  to  defense  applications. 

Our ultimate goal was to narrow and bridge these gaps. This, we argued in
our proposal, can be achieved by providing a semi-automatic or automatic
platform to aid the design of ESIP systems. The term codesign used in the
project title also implies communication, connection and collaboration
between researchers in different areas of expertise, namely, hardware
design, software design, modeling, analysis, optimization and algorithms,
and domain-SIP applications.

1.2  Milestone  statements 

Phase-I. 

Deliver an algorithm-architecture co-design platform for production
of hardware accelerators, hosted by a personal computer, for
geometrically structured matrix-vector products. The design will be
competitive in time-power-area performance to existing ones. The
turnaround time between a design specification and a customized
design, by a senior EE undergraduate student, is (substantially) less
than 10% of the time for manual tuning by an expert team. [1]

Phase-II. 

Deliver an algorithm-architecture co-design platform for production
of application-on-a-chip (AoC), to be embedded in a custom system,
for geometrically structured matrix-vector products. The design will
be competitive in time-power-area performance to existing ones. The
turnaround time between a design specification and a customized
design, by a senior EE undergraduate student, is (substantially) less
than 10% of the time for manual tuning by an expert team.

The  FANTOM-I  system  is  an  accelerator,  driven  by  a  host  machine.  Figure  1 
is  an  initial  and  simple  visual  display  of  the  FANTOM-I  research  mission.  The 
FANTOM-II  system  is  an  autonomous  system,  which  we  describe  in  the  next 
section. We will also describe in Section 2 the significant forward leap from
Phase-I to Phase-II.

A couple of remarks are in order. We envisioned our mission beyond the
stated milestones, and we indeed far surpassed them. The milestone
statements above understate the objectives in a sort of news-reporting
language, which reflects the mainstream management style at DARPA back then.

[1] This milestone was revised by Dr. C. Schwartz when he took over the
DESA program from Dr. D. Cochran.




[Figure 1 diagram, titled "Framework for Accelerating Numerical Transforms
On Microchips". Labels: ease of use (ECE student, DARPA manager); FANTOM
automation (translation from console; forward translation: algorithm ->
architecture; backward translation: architecture -> algorithms);
application program on a PC (WHILE iteration-condition = T: k = k + 1;
u_{k+1} = A × x(u_k); END); fast simulation; adaptive processing;
codification and automation -> quick turnaround.]

Figure 1: FANTOM I: high-level concept diagram.
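The "application program on a PC" loop in Figure 1 can be read as a host-side iteration whose inner matrix-vector product is the part handed to the accelerator. The following is a minimal Python sketch under stated assumptions: `matvec` is a plain software stand-in for the accelerated product, `x` is a user-supplied function of the current iterate, and the stopping rule (change below a tolerance, or an iteration cap) is an assumption, since the figure only says "iteration-condition".

```python
def matvec(A, v):
    """Software stand-in for the accelerated matrix-vector product (assumed)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def host_loop(A, u0, x, tol=1e-10, max_iter=1000):
    """Iterate u_{k+1} = A * x(u_k) until the update is small.

    The tolerance-based stopping rule is an assumption; the figure does
    not specify the iteration condition.
    """
    u = list(u0)
    k = 0
    while k < max_iter:
        k += 1
        u_next = matvec(A, x(u))          # the step offloaded to the accelerator
        change = max(abs(p - q) for p, q in zip(u_next, u))
        u = u_next
        if change < tol:
            break
    return u, k
```

Only the `matvec` call would change when the host hands the product to hardware; the control flow stays on the PC.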

1.3  The  FANTOM  system 

The FANTOM project successfully rendered a working platform for AoC design
by the end of Phase-II. The platform system encodes and embodies our
research results as a whole. It integrates vertically across layers of
dependency, as much as possible, from the basic operations at the bottom to
the user interface at the top. It also integrates horizontally, as much as
possible, between conceptually compartmentalized modules at each layer. In
addition, the platform is designed to be adaptable in response to the rapid
advances in computer technology and computing techniques, and to the
increasing demands of new SIP applications.

We introduce in Section 3 the major research results that underlie the
system, and how the accomplishments are measured.


2  Technical  Problems 

We describe in this section the major technical problems we were to overcome.
Most of the problems lay across the boundaries between traditional
disciplinary studies as well as those between stages and modules in
traditional system design and development. The problems are fundamental and
hard; many of them had not previously been addressed.


2.1  Mutual  algorithm-architecture  constraints 

There are constraints imposed by architectures on algorithms and vice versa.
For example, sparse and irregular data structures are a hallmark of 'smart'
algorithms with low arithmetic complexity (linear or nearly linear
complexity) in free space. Operations with such data structures were not
well supported by modern hardware architectures, which favor regularly
strided data accesses and operations, such as array operations. The
mismatch adds tremendous extra cost to the implementation and execution of
the smart algorithms.
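To make the contrast concrete, the sketch below compares a dense matrix-vector product, which walks memory with a regular stride, against the same product on a compressed sparse row (CSR) layout, where every access to the input vector goes through an index array. CSR is chosen here only as a familiar illustration of an irregular data structure; it is not taken from the report.

```python
def dense_matvec(A, x):
    """Regular traversal: every row touches x[0], x[1], ... in order."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def to_csr(A):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Irregular traversal: x is read through col_idx (indirect addressing)."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[p] * x[col_idx[p]]   # data-dependent access pattern
        y.append(acc)
    return y
```

The sparse version does fewer multiplications, but its row lengths and memory addresses depend on the data, which is exactly what strided hardware pipelines handle poorly.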

In general computational practice, we must deal with the following
discrepancies:

o in data locality: memory hierarchy vs. algorithm hierarchy.

The memory hierarchy is in principle organized for the benefit of temporal
data reuse and spatial data reuse. A tree-like algorithm hierarchy poses
a great challenge to data reuse both temporally and spatially, because
the temporal extension of a cluster of data requires the use of other data
clusters. See Figure 2.

In addition, the memory hierarchy is regularly structured, while the
algorithm hierarchy may be highly irregular depending on the data
distribution.

o in parallelism: dependency-concurrency of an algorithm in free space vs.
parallel operations and patterns supported or favored by architectures.
Again, depending on the data distribution, the particular
dependency-concurrency graph of algorithm execution may be highly irregular.
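The dependence of the algorithm hierarchy on the data distribution can be illustrated with a small adaptive tree. The sketch below is an assumption-laden toy, not the report's FMM code: it bisects the interval [0, 1) until each leaf holds at most `leaf_size` points and returns the depth of every leaf. Evenly spread points give a balanced tree, while clustered points give leaves at very different depths.

```python
def leaf_depths(points, lo=0.0, hi=1.0, leaf_size=2, depth=0, max_depth=20):
    """Return the depth of each leaf of an adaptive binary tree over [lo, hi)."""
    if len(points) <= leaf_size or depth == max_depth:
        return [depth]
    mid = 0.5 * (lo + hi)
    left = [p for p in points if p < mid]
    right = [p for p in points if p >= mid]
    return (leaf_depths(left, lo, mid, leaf_size, depth + 1, max_depth)
            + leaf_depths(right, mid, hi, leaf_size, depth + 1, max_depth))


# Hypothetical inputs: eight evenly spaced points and eight clustered points.
uniform = [(i + 0.5) / 8 for i in range(8)]
clustered = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.9]
```

A regular memory hierarchy matches the balanced case well; in the clustered case the tree levels no longer line up with fixed-size blocks, which is the irregularity described above.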

2.2  Cutting-edge  application  requirements 

The  application  requirements  may  be  described  in  terms  of  accuracy,  latency, 
power  consumption,  and  resource  (area)  consumption.  Often,  the  requirements 
seem  conflicting  and  infeasible  with  the  existing  techniques.  These  requirements 
drive  the  research  to  new  frontiers. 

Among other important and influential applications, we used SAR image
formation as a case study. Figure 3 depicts a key processing component for
SAR image formation. High-efficiency, high-resolution image formation
remains a significant challenge for modern power/volume constrained weapons
systems. It requires broad-band frequency ranges and high image formation
speed but with data sampled on non-Cartesian grids.

Figure 2: Mapping between Algorithm Hierarchy and Memory Hierarchy
(diagram label: FMM Multi-level Model)

2.3 Hardware-Software (HS) partitioning on AoC

An AoC system includes FPGAs and CPUs. The most challenging task in
mapping an SIP application, such as an SAR application, lies in partitioning
the computation tasks between the reconfigurable fabric and multiple CPU
cores. The CPUs may be internal or external to the FPGA. Furthermore, when
internal, the CPUs may be soft cores or hard cores. See Figure 4.
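As a rough illustration of what a partitioning decision involves, the sketch below applies one simple greedy heuristic: move the tasks with the largest time saved per unit of fabric area onto the FPGA until an area budget is exhausted, and leave the rest on the CPU cores. This is a generic heuristic shown for orientation only, not the FANTOM partitioning method; the `Task` fields, task names and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu_time: float    # run time on a CPU core (hypothetical units)
    fpga_time: float   # run time on the reconfigurable fabric
    fpga_area: float   # fabric area the task would occupy


def partition(tasks, area_budget):
    """Greedy split: best time saved per unit area goes to the fabric first."""
    candidates = [t for t in tasks
                  if t.cpu_time > t.fpga_time and t.fpga_area > 0]
    candidates.sort(key=lambda t: (t.cpu_time - t.fpga_time) / t.fpga_area,
                    reverse=True)
    on_fabric, used = [], 0.0
    for t in candidates:
        if used + t.fpga_area <= area_budget:
            on_fabric.append(t.name)
            used += t.fpga_area
    on_cpu = [t.name for t in tasks if t.name not in on_fabric]
    return on_fabric, on_cpu
```

A realistic partitioner would also account for communication between the two sides and for whether the CPU is a soft or hard core, which this sketch ignores.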

2.4  Optimization-Automation  co-dependence 

Modeling, estimation and optimization were conventionally restricted to each
operation module and often carried out manually, by expert designers. Joint
optimization at each integration stage, and across the stages, is hard and
often skipped. The lack of automation in practice was partially due to the
lack of modeling, analysis and optimization. On the other hand, over a
complex design space, optimization must be assisted by automation.

Figure 3: 2D Translation and Transformation for SAR image formation,
provided by research personnel at the Radar Signal Processing Center at
Raytheon Missile Systems

This  includes,  for  example,  the  joint  optimization  in  accuracy  and  efficiency. 
Accuracy  is  affected  by  numerical  ranges  in  input,  output  and  intermediate  data, 
and  data  representation  on  an  AoC  (format  and  precision).  In  conventional 
ESIP  system  design,  the  fixed-point  representation  with  few  bits  was  much 
preferred  for  efficiency. 
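The effect of format and precision on accuracy can be seen in a small fixed-point model. The sketch below assumes a signed two's-complement format with `total_bits` bits, of which `frac_bits` are fractional, with round-to-nearest and saturation; these conventions are assumptions for illustration and are not specified in the report.

```python
def to_fixed(x, frac_bits, total_bits):
    """Return the value actually represented when x is stored in fixed point.

    Assumes signed two's complement, round-to-nearest, and saturation
    on overflow.
    """
    scale = 1 << frac_bits
    q = round(x * scale)                       # nearest representable code
    q_max = (1 << (total_bits - 1)) - 1
    q_min = -(1 << (total_bits - 1))
    q = max(q_min, min(q_max, q))              # saturate instead of wrapping
    return q / scale
```

For in-range values the rounding error is at most half of the least significant bit, 2^-(frac_bits + 1), so each fractional bit removed doubles the worst-case error while each integer bit removed halves the representable range.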

Conventional methods confined the accuracy and throughput, for instance,
in a tight trade-off space. A higher throughput was to be traded off with a
lowered accuracy.

Figure 4: Hardware and Software Partitioning and Codesign.

2.5  Leap  forward  from  Phase-I  to  Phase-II 

We highlight in Figure 5 the major changes from the first to the second
generation of the FANTOM system, broken down in terms of the system
components. In the rest of the section we elaborate on the challenging
issues in FANTOM-II and our approaches to the solution.

3  Major  Research  Contributions 


We have identified in Section 2 the major gaps and the fundamental technical
problems to overcome. In this section, we describe briefly a few major
scientific results, inventions and engineering accomplishments by the
research project. In other words, we describe the conceptual and engineering
components the FANTOM system is based upon and made of. We annotate each
item with the development Phase, I or II, in which it was completed. Most of
the details can be found in the FANTOM bibliography, where the most closely
related works by other researchers are cited.

Figure 5: Advances from FANTOM-I to FANTOM-II with Major/Medium/Minor
Changes Color-Coded in Blue/Green/Yellow, Respectively.

3.1 Specification abstraction & expansion

Conventionally, an AoC design specification is provided as code in a
programming language such as C. We refer to this method as the code
specification. First, a code specification is subject to the expressive
scope of the language, the programmer's understanding of the computation
task, and the programmer's ability to describe the task within his or her
command of the language. Next, the code specification is instantiated in
hardware, basically, instruction by instruction. A translation word by
word, phrase by phrase, without regard to context and structure is not a
good approach even for natural language. In other words, the code
specification and instantiation approach is the first spot that not only
narrows down the design space but also, most likely, deforms it.

We  broke  this  conventional  barrier  right  at  the  beginning.  We  specify  the 
application  algorithm  in  a  high-level  description,  in  particular,  in  MATLAB  with 
FANTOM  annotations.  No  architectural  constraints  are  imposed  unnecessarily  in 
task  specification.  We  then  expand  the  specification  at  different  levels  of  detail. 

3.2 Chiplet library & chip design hierarchy

Exploiting the very same idea behind software libraries, we developed the
chiplet library, which consists of parametric chip models for the basic and
primitive operations. The importance of the library became evident during
the project period, in which the reconfigurable fabric hardware was changed
and updated multiple times. These changes at the very bottom of the system
design did not collapse our system, thanks to the chiplet library we had
abstracted and established first: most of the changes were localized at the
chiplet level. There are two types of chiplets, basic operations (bricks)
and basic compositions (mortar). The library was developed in Phase-I.

3.3  Iterative  forward-backward  mapping 

We established a formal description of the algorithm-architecture codesign
space in terms of forward and backward mappings, a systematic modeling
framework for the two-way mappings, and an efficient and effective approach
to performance estimation and adaptive search for optimal design(s), in
Phase-I.

For instance, the mapping between the potential concurrency in the fast
multipole method (FMM) and the parallel patterns favored by hardware
structures (Figure 2) takes a few iterative steps.

3.4  Novel  instruction-function  generation 

We invented and developed a novel FPGA design system that enables recursive
and nested generation of function modules with increasing complexity, within
the restricted resources on FPGAs. It was designed and developed in
Phase-II.

Figure 6: FANTOM special-purpose (SP) compiler to automate the generation
of lookup-table based implementations for convolution kernel functions,
under accuracy and resource constraints, and two architecture-aware
geometric-tiling-based parallelization strategies for convolutions,
targeting both uniformly distributed and non-uniformly distributed input
samples.

This recursion idea and technique allows model functions to be constructed,
generated and encapsulated shell by shell, with the same fixed and very
limited amount of resource on an FPGA. Roughly speaking, a function at the
inner shell can be used as if it were a native instruction to the
composition at the next level. If this sounds similar to a nested software
design, think about putting a strong memory limitation on the entire
instruction set.
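The shell-by-shell idea can be mimicked in software as follows. This is only an analogy under loud assumptions: `MAX_SLOTS` is a hypothetical per-shell instruction budget standing in for the fixed FPGA resource, and `make_shell`, `mac`, `poly` and `kernel` are illustrative names that do not come from the FANTOM system. Each shell is built from a handful of "instructions", and once built it counts as a single instruction for the next shell.

```python
MAX_SLOTS = 4  # hypothetical fixed budget of instructions per shell


def make_shell(instructions, compose):
    """Wrap a composition of at most MAX_SLOTS instructions as a new function.

    The returned function can itself be listed as one instruction of the
    next shell.
    """
    if len(instructions) > MAX_SLOTS:
        raise ValueError("shell exceeds the fixed instruction budget")

    def shell(*args):
        return compose(instructions, *args)

    return shell


# Shell 0: native operations.
def add(a, b):
    return a + b


def mul(a, b):
    return a * b


# Shell 1: multiply-accumulate, a * b + c, from the native operations.
mac = make_shell([mul, add], lambda ins, a, b, c: ins[1](ins[0](a, b), c))


# Shell 2: polynomial evaluation (Horner's rule) using mac as if native.
def horner(ins, coeffs, x):
    acc = coeffs[0]
    for c in coeffs[1:]:
        acc = ins[0](acc, x, c)
    return acc


poly = make_shell([mac], horner)

# Shell 3: a kernel approximating exp(-x^2) by a cubic Taylor polynomial
# in t = x^2, using poly and mul as if they were native instructions.
EXP_NEG = [-1.0 / 6.0, 0.5, -1.0, 1.0]   # coefficients, highest degree first
kernel = make_shell([poly, mul], lambda ins, x: ins[0](EXP_NEG, ins[1](x, x)))
```

No shell ever lists more than `MAX_SLOTS` entries, however complex the function it finally computes, which is the point of the analogy.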

3.5  Special-purpose  mini-compilers 

Most of the above is enabled and automated by our compiler techniques.
We developed special-purpose compilation techniques in multiple stages for
translating, interpreting, transforming, and mapping from an
application-specific algorithm specification at a graphical user interface
all the way to a high-performance instantiation on FPGAs. They target
accelerators in Phase-I and systems-on-chip in Phase-II.

3.6  Processing  irregular  data  on  regular  architecture 

We developed the very first system design platform that permits modern SIP
applications with irregular sampling, a significant leap from simulated or
post-processing studies. As mentioned in Section 2.2, this is important to
SAR image formation. See Figure 7. The initial module was for accelerators
in Phase-I; it was extended to systems-on-chip in Phase-II.


Figure 7: Multi-resolution image formation by adaptive formation from
non-uniformly sampled data (simulation result)
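One common way to bring irregularly sampled data onto a regular structure is gridding: each sample is spread onto nearby points of a uniform grid with a short convolution kernel. The sketch below is a one-dimensional toy under stated assumptions (a truncated Gaussian kernel, periodic wrap-around, weights normalized per sample); the kernel shape, widths and function name are illustrative choices, not the kernels or code used in FANTOM.

```python
import math


def grid_samples(positions, values, grid_size, half_width=2, sigma=0.8):
    """Spread non-uniform samples onto a regular periodic 1D grid.

    positions are given in grid units, in [0, grid_size). Each sample is
    distributed over 2 * half_width + 1 neighboring grid points.
    """
    grid = [0.0] * grid_size
    for x, v in zip(positions, values):
        center = round(x)
        idx = range(center - half_width, center + half_width + 1)
        w = [math.exp(-0.5 * ((i - x) / sigma) ** 2) for i in idx]
        total = sum(w)
        for i, wi in zip(idx, w):
            # Normalized weights: each sample contributes its full value.
            grid[i % grid_size] += v * wi / total
    return grid
```

After this step the data live on a regular array, so the downstream processing can use the strided operations that hardware favors; the irregularity is confined to the spreading loop.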


3.7  Shifting  accuracy-efficiency  trade-off  boundary 

We established a distinct methodology for co-optimization in accuracy and
speed performance (subject to resource constraints), which shifts the
trade-off boundary well beyond the traditional ones, in Phase-I.

For example, we may represent a filter kernel and filtered data in two parts.
We use an economical representation for the first part. Specifically, we
make combined use of low-bit data representation and lookup tables. The
low-bit representation includes fixed-point representation, blockwise
fixed-point representation, or customized low-bit floating-point
representation. We then use fast processing during execution to get the
second part and reach the required accuracy.
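One possible reading of the two-part idea is a coarse, low-bit lookup table combined with a cheap run-time correction. The sketch below follows that reading only as an illustration: the kernel exp(-x^2), the 8 fractional bits, the 64-entry table and the use of linear interpolation as the "second part" are all assumptions, not the scheme reported by the project.

```python
import math

FRAC_BITS = 8          # assumed low-bit fixed-point precision of table entries
N = 64                 # assumed number of table intervals
X_MAX = 4.0            # kernel is tabulated on [0, X_MAX]
STEP = X_MAX / N


def quantize(v):
    """Round v to FRAC_BITS fractional bits."""
    return round(v * (1 << FRAC_BITS)) / (1 << FRAC_BITS)


# First part: a small table of low-bit kernel values.
LUT = [quantize(math.exp(-(i * STEP) ** 2)) for i in range(N + 1)]


def kernel_coarse(x):
    """Table lookup only (valid for 0 <= x < X_MAX)."""
    return LUT[int(x / STEP)]


def kernel_refined(x):
    """Table lookup plus a run-time linear correction between entries."""
    i = int(x / STEP)
    frac = x / STEP - i
    return LUT[i] + frac * (LUT[i + 1] - LUT[i])


def max_error(approx, samples=399):
    """Largest deviation from exp(-x^2) over x = 0.00, 0.01, ..., 3.98."""
    xs = [k * 0.01 for k in range(samples)]
    return max(abs(approx(x) - math.exp(-x * x)) for x in xs)
```

With these settings the lookup alone is off by a few hundredths near the steepest part of the kernel, while the corrected value stays within a few thousandths, at the cost of one extra table read, one subtraction and one multiply-add per evaluation.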

3.8  System  rendering 

We produced a video screencast to introduce, demonstrate and give a tutorial
on the system. S. Kestur, a graduate student then, narrated in his gentle
voice step by step how to use the system. The late manager, Dr. Healy, was
very pleased with it.

                           GRAPE-6    FANTOM-I
Device type                ASIC       FPGA (XC2VP100-6)
Device technology (µm)     0.25       0.13
PEs/chip                   6          3
Frequency (MHz)            90         125
Real Peak (GFLOPS)         17.2       22.2
Power Consumption (W)      12         6.5

Table 1: Performance comparison with GRAPE-6

3.9  Accommodation  of  COTS  products 

In addition to the above efforts and accomplishments, we have also envisioned
and engaged in FANTOM research on accommodation of emerging
commercial-off-the-shelf (COTS) products. The COTS products include game or
graphics processors by IBM, AMD, and NVIDIA. They lie between
general-purpose processors and special-purpose processors. This work was
carried out in Phase-II.

3.10  Measure  of  success 

FANTOM  accomplishments  have  been  measured  in  three  different  ways. 

First, as promised in the milestone statements, we carefully set up a
comparison environment at the Department of Electrical and Computer
Engineering, PSU. The comparison results were as expected and reported in
DARPA reviews, in Phase-I and Phase-II.

Next, in Phase-I, we took the conventional benchmarking approach and made a
comparison to GRAPE-6, the best special-purpose computer for molecular
dynamics simulation, made in Japan and fully supported by the Japanese
government. See Table 1.
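The Table 1 entries can be combined into a performance-per-watt figure, which is a convenient single number for power-constrained systems. The short sketch below uses only the peak GFLOPS and power values from Table 1; the derived ratios are our arithmetic and are not reported in the table itself.

```python
# Peak performance and power consumption as listed in Table 1.
TABLE_1 = {
    "GRAPE-6": {"gflops": 17.2, "watts": 12.0},
    "FANTOM-I": {"gflops": 22.2, "watts": 6.5},
}


def gflops_per_watt(name):
    """Derived metric: peak GFLOPS divided by power consumption."""
    entry = TABLE_1[name]
    return entry["gflops"] / entry["watts"]


def efficiency_ratio():
    """How many times more GFLOPS per watt FANTOM-I delivers than GRAPE-6."""
    return gflops_per_watt("FANTOM-I") / gflops_per_watt("GRAPE-6")
```

This gives roughly 1.43 GFLOPS/W for GRAPE-6 and 3.42 GFLOPS/W for FANTOM-I, a ratio of about 2.4 in favor of FANTOM-I on peak figures.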

Finally, and most remarkably, the FANTOM system and methodology have been
employed, and hence tested, since 2010 by other research and development
projects, at DARPA and elsewhere. In particular, three research projects
under the NeoVision2 program at DARPA employed the FANTOM system and
methodology. Each project team had its own ESIP design experts. With the aid
of the FANTOM system and methodology, each of the teams reached a reduction
in power consumption by a factor of O(10^3) and a faster processing speed,
without compromising accuracy. We describe in Section 4 more of FANTOM
research translation and impact.

4  Research  Translation 

4.1  Direct  deployment  and  impact 

The FANTOM platform and methodology have been employed in multiple projects
at DARPA and elsewhere, in particular by the following three research teams
under the NeoVision2 program at DARPA:

o  M.  Peot’s  team  at  Teledyne  Scientific  Imaging,  Inc. 

o  L.  Itti’s  team  at  the  Univ.  of  Southern  California 

o  D.  Khosla’s  team  at  HRL  Laboratories,  LLC. 

Other research groups at the Office of Naval Research and at Intel Co.
have been interested in exploring and exploiting FANTOM methodologies.
Three of the co-PIs have been approached by Intel researchers for potential
collaborations on commercial applications.

4.2  Community  recognition 

In  addition  to  the  publications  listed  in  the  Bibliography  section,  all  co-PIs  have 
given  FANTOM  talks  at  various  conferences.  In  particular, 

o Co-PI V. Narayanan has been highly visible and influential for his major
contributions to the FANTOM project, among his other projects. He was
invited as a keynote speaker at many conferences, including

- International Symposium on High Performance Computing Architecture,
Jan 2010

- FETCH 2012, Alpe d'Huez, France

- VLSI Design Conference, January 2013, Pune, India

- 4th Workshop on SoCs, Heterogeneous Architectures and Workloads
(SHAWM), February 24th 2013, Shenzhen, China

- Workshop on Neuromorphic and Brain-Based Computing Systems
(NeuComp 2013), Grenoble, March 2013

o Co-PI M. Kandemir and his students won the best paper award at the
International Parallel and Distributed Processing Symposium, 2008.

o Two conference papers on project FANTOM were specially invited as
contributed papers in special issues of high-influence journals.

All FANTOM students and postdocs were recruited by highly competitive
research institutes or groups; see Table 2 for the placement. In particular,
Dr. Kevin Irick started a company, Silicon Scapes, and has been its CEO.

Institution  Name                  Position       Placement
Duke         Nihshanka Debroy      MS             Deloitte Consulting
Duke         Paolo Bientinesi      postdoc        RWTH Aachen University
Duke         Tian Xiao             postdoc        Wave Computation Technologies
ASU          Lanping Deng          PhD            Oppo Digital
ASU          Chi-Li Yu             PhD            Marvell Semiconductors
ASU          Kanwaldeep Sobti      MS             AMD
PSU          Jungsub Kim           PhD            Samsung
PSU          Prasanth Mangalagiri  PhD            Intel Co.
PSU          Kevin Irick           PhD            Silicon Scapes (founder & CEO)
PSU          Yuanrui Zhang         PhD            Intel Co.
PSU          Srinidhi Kestur       PhD            Intel Co.
PSU          Sungho Park           PhD candidate

Table 2: Placement of former students and postdocs partially or fully
supported by FANTOM


5  Implications  for  Related/Future  Research 

We  speculate  that  more  FANTOM-like  methodologies  are  needed  in  the  next 
decade,  at  least,  for  developing  small,  smart  and  special-purpose  ESIP  systems, 
considering  the  following  factors. 

o At the time project FANTOM started, there was no iPhone. The emergence
of the iPhone in June 2009 was in the finishing days of the FANTOM project.
Developers and researchers at large began to realize, not much ahead of
the populace, the importance and great potential of small, smart and
special-purpose ESIP systems.

o The approaching limits of Moore's law imply that we must not continue
the rich man's way of using hardware resources with ever-swelling
software packages.

o While mobile computing devices are enabled by AoC techniques, mobile
communication techniques will expand and advance the AoC techniques
for larger and larger scale processing.

6 Additional Comments (intended for program managers)

FANTOM research results are remarkable, especially given the following
conditions: a small team (5 co-PIs, 2 postdoctoral years, 10 graduate
students, in total), a short time span (05/2005-02/2010, including a no-cost
extension of 6 months), and a modest budget. Most of the FANTOM publications
appeared in the late stage of Phase-II, not only because of the fruitful
results but also thanks to a welcome change at DARPA from the extreme
product-driven, deadline-driven style of the earlier years of the FANTOM
project.

In the 4-year project duration, the management of FANTOM passed through the
hands of 4 program managers at DARPA. We the co-PIs thank Dr. D. Cochran for
having the initial vision and bringing together researchers from different
research areas. We thank Dr. C. Schwartz for managing the project with
professional appreciation and intense passion. We remember the late Dr. D.
Healy, who passed away in Sept. 2009, with absolute respect and admiration
for his insightful and gentle guidance and inspiration, and for his
dedication to scientific research, to DARPA and to researchers on DARPA
projects. We thank Dr. A. Kane for taking over the position left by Dr.
Healy a year later and for reviewing our final project report in Jan. 2011.

At the test and validation stage of the project, FANTOM's partner at
Raytheon left his company and hence this project. Fortunately, FANTOM was
tested in other and perhaps more effective ways; see Section 3.

Many end-products of DARPA projects fall into the category of embedded
systems. The FANTOM research results can be applied to more ongoing
DARPA research projects, if the results are broadly introduced via program
managers. We have seen and heard of some projects struggling with FPGA
implementation by learning from scratch.
We have developed the FANTOM system and methodology and provided
important and critical assistance to broader application system designs. We
also emphasize the importance of maintaining the leading researchers and
active research in ESIP system design and development. Systems must be
updated or transformed or replaced one way or another, sooner or later.
Through continuously active and advanced research activities, we make new
advances and foster a new generation of researchers. In this respect, the
most important FANTOM result is the FANTOM students, who are highly
sought-after recruits and new entrepreneurs.

This report was requested by ARO for the official closure in paperwork of
the research project. Because of the loss of some FANTOM data and files due
to a failed workstation in 2011, the authors took extra time and effort to
reconstruct this report from the remaining and published material.